2025-01-06
| Feature | Bag-of-Words | Static Embeddings | Transformers |
|---|---|---|---|
| Core Idea | Word count representation | Dense, fixed-size vectors capturing word semantics | Context-aware embeddings using attention mechanisms |
| Representation | Sparse vectors (e.g., one-hot encoding, counts) | Dense vectors (e.g., 300 dimensions) | Contextualized vectors generated dynamically |
| Semantics | None (words treated independently) | Semantic similarity, but context-independent | Semantic and context-aware understanding |
| Polysemy Handling | Cannot distinguish between meanings | Single vector for all senses of a word | Context-aware disambiguation |
| Methods | TF-IDF, word counts, scaling methods | Word2Vec, GloVe, FastText | LLMs: BERT, GPT, Llama, etc. |
| Weaknesses | Ignores order and context, sparse | Context-blind, limited for nuanced tasks | Computationally expensive, requires large data |
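The sparse, order-blind nature of bag-of-words can be illustrated in a few lines of plain Python (a minimal sketch; the documents and vocabulary are invented for illustration):

```python
from collections import Counter

# Two toy documents; the vocabulary is simply every word seen across them.
docs = ["the cat sat on the mat", "the dog sat"]
vocab = sorted({w for d in docs for w in d.split()})

def bow_vector(doc):
    """Sparse count representation: one dimension per vocabulary word."""
    counts = Counter(doc.split())
    return [counts[w] for w in vocab]

vec = bow_vector(docs[0])
# Word order is lost entirely: "the cat sat on the mat" and
# "the mat sat on the cat" map to the exact same count vector.
```

Note that the vector length grows with the vocabulary, while dense embeddings keep a fixed size (e.g., 300 dimensions) regardless of vocabulary size.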
A simple one-layered NN
The terms deep learning and neural networks (NNs) are often used interchangeably
"Deep" refers to the number of layers in the network (i.e., its depth)
Weights: values that define the strength of the connection between two neurons in adjacent layers of a neural network
Biases: parameters that allow the network to shift the activation function up or down
Activation Function: decides whether a neuron should “fire” (pass its signal forward) and how strong that signal should be
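The three ingredients above come together in a single forward pass. A minimal NumPy sketch of a one-layer network (sizes and values are arbitrary, chosen only for illustration):

```python
import numpy as np

def relu(x):
    # Activation function: the neuron "fires" only for positive inputs.
    return np.maximum(0.0, x)

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))  # weights: connection strengths, 3 inputs -> 4 neurons
b = np.zeros(4)              # biases: shift the activation function up or down

x = np.array([1.0, -2.0, 0.5])  # one input example
h = relu(x @ W + b)             # weighted sum + bias, passed through the activation
```

During training, `W` and `b` are the parameters adjusted to fit the data; the activation is what makes stacking layers more expressive than a single linear map.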
Transformers are also NNs; state of the art since early 2018
"Attention Is All You Need" (Vaswani et al., 2017)
Attention Mechanism: lets each token weigh how relevant every other token in the sequence is when building its contextualized representation
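The core of the mechanism is scaled dot-product attention from the paper above: softmax(QK^T / sqrt(d_k)) V. A minimal NumPy version with toy dimensions (5 tokens, dimension 8; all values are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(QK^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # how strongly each query attends to each key
    weights = softmax(scores, axis=-1)   # each row is a distribution over tokens
    return weights @ V, weights

rng = np.random.default_rng(0)
Q = rng.normal(size=(5, 8))  # queries, keys, values for 5 tokens
K = rng.normal(size=(5, 8))
V = rng.normal(size=(5, 8))
out, w = attention(Q, K, V)  # out: one context-aware vector per token
```

In a real Transformer, Q, K, and V are linear projections of the token embeddings and attention is applied across multiple heads, but the weighting logic is exactly this.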
Causally predicting the next word
| | BERT | GPT |
|---|---|---|
| Type | Encoder-only | Decoder-only |
| Training | Masked Language Modeling (MLM) | Causal Language Modeling (CLM) |
| Direction | Bi-directional | Uni-directional |
| Task Focus | Language understanding | Language generation |
| Strength | Deep sentence/context understanding | Coherent and fluent text generation |
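The bi- vs uni-directional row comes down to the attention mask: BERT-style encoders let every token attend to every other token, while GPT-style decoders mask out future positions so each token only sees its past. A sketch with a toy sequence length (the length is arbitrary):

```python
import numpy as np

n = 4  # toy sequence length

# Bidirectional (BERT-style): every position may attend to every other position.
bidirectional_mask = np.ones((n, n), dtype=bool)

# Causal (GPT-style): position i may only attend to positions <= i.
causal_mask = np.tril(np.ones((n, n), dtype=bool))

# In practice, masked-out attention scores are set to -inf before the
# softmax, so the forbidden (future) positions receive zero weight.
```

This single masking choice is what makes MLM-trained models suited to understanding and CLM-trained models suited to left-to-right generation.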
GPT was developed by OpenAI